A security researcher bypassed Claude Opus 4.6's policy evaluation with just four short prompts, generating attack code against live infrastructure. Plus 915 files exfiltrated from the sandbox.
GitHub releases the layered defense design of the agent execution platform, and OpenAI releases the instruction hierarchy training data IH-Challenge and model. Responses to prompt injection were received from both infrastructure design and training axes.